Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[low-code cdk] Allow for spec file to be defined in the yaml manifest instead of an external file #18411

Merged
merged 12 commits into from
Nov 7, 2022

Conversation

brianjlai
Copy link
Contributor

@brianjlai brianjlai commented Oct 25, 2022

What

Allows for developers to define the connector specification used in the spec command within the manifest file instead of a separate spec.json or spec.yaml

How

  • Added a new spec component so we can get additional schema validation and make some of our casing more consistent (it still outputs the same format)
  • Update the generator to define spec in the yaml
  • Also fixed all of our commented out tests that weren't really working due to issues w/ how we read temp files and added better testing of edge cases involving how spec gets defined

Recommended reading order

  1. spec.py
  2. y.python

Validation

Once we have this all reviewed I'll migrate either a connector from hacktoberfest or greenhouse to define it in the manifest to verify publishing still works. I think it should since specs in the various mask files are generated from the spec command run on the container and not the literal spec.yaml file itself.

@github-actions github-actions bot added the CDK Connector Development Kit label Oct 25, 2022
@@ -243,7 +243,7 @@ def is_object_definition_with_class_name(definition):

@staticmethod
def is_object_definition_with_type(definition):
return isinstance(definition, dict) and "type" in definition
return isinstance(definition, dict) and "type" in definition and definition["type"] != "object"
Copy link
Contributor Author

@brianjlai brianjlai Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes me slightly nervous, but the connection_specification is effectively just a mapping/dict/object so when we are generating the spec component, it gets read in as an object and type: object gets added the component. Then, when because it has a type we attempt to resolve it in the CLASS_TYPES_REGISTRY which will correctly fail because type isn't in that map.

I'm not sure why we never saw this before, but I spot checked some components and didn't see any fields that were of mapping type so that might be why.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting. I assume the request options provider's field work because they're RequestInput, not raw Mapping?


spec:
documentation_url: https://docsurl.com
connection_specification:
Copy link
Contributor Author

@brianjlai brianjlai Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth noting that this went from camel case to snake case. I like better consistency with our component definitions across the file and the response of a spec call will still be camel case to retain compatibility which I mention in another comment.

However, it does make it a small gotcha for migrating existing connectors that it needs to be switched, but it will be very clear and obvious since the YAML will fail validation and report that the fields are missing. I not married to either format tho.

spec:
documentation_url: https://docsurl.com
connection_specification:
$schema: http://json-schema.org/draft-07/schema#
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left it here as a reminder, but we can actually remove the $schema from the generator because it will get inferred to the latest version when its not present. I kind of like removing it because right now it exposes an underlying implementation detail of how we validate schemas.

Con is that it becomes a small gotcha for a developer that there could be a mismatch in schema drafts if there is some kind of incompatibility between draft versions when we always use the latest

@brianjlai brianjlai marked this pull request as ready for review October 25, 2022 02:49
@brianjlai brianjlai requested a review from a team as a code owner October 25, 2022 02:49
@@ -69,6 +70,27 @@ def streams(self, config: Mapping[str, Any]) -> List[Stream]:
self._apply_log_level_to_stream_logger(self.logger, stream)
return source_streams

def spec(self, logger: logging.Logger) -> ConnectorSpecification:
"""
Returns the spec for this integration. The spec is a JSON-Schema object describing the required configurations (e.g: username
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Returns the spec for this integration. The spec is a JSON-Schema object describing the required configurations (e.g: username
Returns the connector specification (spec) as defined in the Airbyte Protocol. The spec is an object describing the possible configurations (e.g: username

def spec(self, logger: logging.Logger) -> ConnectorSpecification:
"""
Returns the spec for this integration. The spec is a JSON-Schema object describing the required configurations (e.g: username
and password) required to run this integration. For low-code connectors, this will first attempt to load the spec from the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and password) required to run this integration. For low-code connectors, this will first attempt to load the spec from the
and password) which can be configured when running this connector. For low-code connectors, this will first attempt to load the spec from the

@@ -243,7 +243,7 @@ def is_object_definition_with_class_name(definition):

@staticmethod
def is_object_definition_with_type(definition):
return isinstance(definition, dict) and "type" in definition
return isinstance(definition, dict) and "type" in definition and definition["type"] != "object"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name of the method is_object... should this check be !=?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with how are connector specifications are defined, the type field is an overloaded term. In the manifest, type is just shorthand so we don't have to include the full classpath. But for the connector specification object, type: object is actually part of the spec schema. So the factory gets confused and thinks we need to parse the spec as another declarative component, but we really rant this to remain a Mapping. Without this line, it looks for object in the CLASS_TYPES_REGISTRY

It's a little bit of a hack, but it allows us to make the manifest defined and external defined specs interchangable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brianjlai should we rename our type field to something less overloaded as part of lowcode to beta?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add this context as a comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup will add!


Attributes:
documentation_url (str): The link the Airbyte documentation about this connector
connection_specification (str): information related to how a connector can be configured
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the type should be Mapping[str, Any]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops yeah, commented to quickly

options: InitVar[Mapping[str, Any]]

def __post_init__(self, options: Mapping[str, Any]):
self._options = options
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we explicitly not store the options to ensure they are not used as part of the spec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's a good point if we want to treat spec exactly how it is today. will update this to not persist or use options anywhere

spec = self._source_config.get("spec")
if spec:
if "class_name" not in spec:
spec["class_name"] = "airbyte_cdk.sources.declarative.spec.Spec"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the developer uses a type field instead of class_name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three formats should work. If they define the class_name then we'll just use that. We define Spec in the class_types_registry so it should work with type, and if it's omitted here then we'll insert class name here.

documentation_url = spec.documentation_url
connection_specification = spec.connection_specification
assert documentation_url == "https://airbyte.com/#yaml-from-manifest"
print(connection_specification)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove the debug print

@brianjlai brianjlai mentioned this pull request Nov 7, 2022
37 tasks
@brianjlai
Copy link
Contributor Author

brianjlai commented Nov 7, 2022

/publish-cdk dry-run=true

🕑 https://github.com/airbytehq/airbyte/actions/runs/3413359869
https://github.com/airbytehq/airbyte/actions/runs/3413359869

@brianjlai
Copy link
Contributor Author

brianjlai commented Nov 7, 2022

/publish-cdk dry-run=false

🕑 https://github.com/airbytehq/airbyte/actions/runs/3413427564
https://github.com/airbytehq/airbyte/actions/runs/3413427564

@brianjlai brianjlai merged commit f9863d6 into master Nov 7, 2022
@brianjlai brianjlai deleted the brian/low_code_spec_in_manifest branch November 7, 2022 19:44
letiescanciano added a commit that referenced this pull request Nov 8, 2022
…nent

* master:
  🪟 🎉 Enable frontend validation for <1hr syncs (cloud) #19028
  Stream returns AirbyteMessages (#18572)
  🎉 New Source - Recruitee [low-code SDK] (#18671)
  🎉 New source: Breezometer [low-code cdk] (#18650)
  Check disabled connections after protocol update (#18990)
  Simple default replication worker refactor (#19002)
  🎉 New Source: Visma e-conomic (#18595)
  🎉 New Source: Fastbill (#18593)
  Bmoric/extract state api (#18980)
  🎉 New Source: Zapier Supported Storage (#18442)
  🎉 New source: Klarna (#18385)
  `AirbyteEstimateTraceMessage` (#18875)
  Extract source definition api (#18977)
  [low-code cdk] Allow for spec file to be defined in the yaml manifest instead of an external file (#18411)
  🐛 Source HubSpot: fix property scopes (#18624)
  Ensure that only 6-character hex values are passed to monaco editor (#18943)
Comment on lines +32 to +33
return ConnectorSpecification.parse_obj(
{"documentationUrl": self.documentation_url, "connectionSpecification": self.connection_specification}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brianjlai how this will affect existing low-code connectors using the previous format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not, in the event that a spec is not defined in the manifest itself, it will default to using the original way of retrieving and generating the spec from the spec.yaml or spec.json file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants